Acronyms

10 Gigabit Attachment Unit Interface (XAUI)
Advanced Encryption Standard (AES)
Advanced eXtensible Interface (AXI)
Advanced High-performance Bus (AHB)
Agile Mixed Signal (AMS)
ARM Holdings Public Limited Company (ARM)
Block random access memory (BRAM)
Block triple modular redundancy (BTMR)
Built-in self-test (BIST)
Cache Coherent Interconnect (CCI)
Combinatorial logic (CL)
Commercial off the shelf (COTS)
Complementary metal-oxide semiconductor (CMOS)
Computer aided design (CAD)
Controller Area Network (CAN)
Device under test (DUT)
Digital Signal Processing (DSP)
Direct Memory Access (DMA)
Distributed triple modular redundancy (DTMR)
Double Data Rate (DDR3 = Generation 3; DDR4 = Generation 4)
Edge-triggered flip-flops (DFFs)
Ethernet Media Access Controller (EMAC)
Error-Correcting Code (ECC)
Field programmable gate array (FPGA)
Floating Point Unit (FPU)
General purpose input/output (GPIO)
Generic Interrupt Controller (GIC)
Global triple modular redundancy (GTMR)
Hardware description language (HDL)
High Performance Input/Output (HPIO)
Hard Processor System (HPS)
High Speed Bus Interface (PS-GTR)
Input/output (I/O)
Intellectual Property (IP)
Inter-Integrated Circuit (I2C)
Internal configuration access port (ICAP)
Joint test action group (JTAG)
Lightweight Hard Processor System (LW HPS)
Linear energy transfer (LET)
Local triple modular redundancy (LTMR)
Look up table (LUT)
Low Power (LP)
Low-Voltage Differential Signaling (LVDS)
Memory Management Unit (MMU)
Microprocessor (MP)
Embedded Multi-die Interconnect Bridge (EMIB)
MultiMediaCard (MMC)
Multiport Front-End (MPFE)
Not OR logic gate (NOR)
Operational frequency (f_s)
Oscillator (RC OSC)
Peripheral Component Interconnect Express (PCIe)
Personal Computer (PC)
Phase locked loop (PLL)
Physical layer (PHY)
Physical medium attachment sub-layer (PMA)
Power on reset (POR)
Probability of flip-flop upset (P_DFFSEU)
Probability of logic masking (P_logic)
Probability of transient generation (P_gen)
Probability of transient propagation (P_prop)
Processor (PC)
Radiation Effects and Analysis Group (REAG)
Radiation Tolerant (RT)
Snoop Control Unit (SCU)
Secure Digital (SD)
Secure Digital/embedded MultiMediaCard (SD/eMMC)
Secure Digital Input/Output (SDIO)
Serial Advanced Technology Attachment (SATA)
Serial Peripheral Interface (SPI)
Quad Serial Peripheral Interface (QSPI)
Serializer/deserializer (Serdes EPCS)
Single event functional interrupt (SEFI)
Single event latch-up (SEL)
Single event transient (SET)
Single event upset (SEU)
Single event upset cross-section (σ_SEU)
Secure Device Manager (SDM)
Static random access memory (SRAM)
System Memory Management Unit (SMMU)
System on a chip (SOC)
Transceiver types (GTH/GTY)
Transient width (t_width)
Triple modular redundancy (TMR)
Universal Asynchronous Receiver/Transmitter (UART)
Universal Synchronous Receiver/Transmitter (USRT)
Universal Serial Bus (USB)
Universal Serial Bus On-the-Go (USB OTG)
Watchdog Timer (WDT)
Windowed Shift Register (WSR)

Problem Statement

• For many years, intellectual property (IP) cores have been incorporated into field programmable gate array (FPGA) and application specific integrated circuit (ASIC) design flows.
• However, the use of large, complex IP cores was limited in products that required a high level of reliability.
• This is no longer the case. IP core insertion has become mainstream, including in highly reliable products.
• Because IP cores offer limited visibility and control, using them poses challenges that can compromise product reliability.

IP Core Terminology Regarding FPGA Insertion

• IP cores are blocks of logic elements that:
  - Reduce time-to-market.
  - Eliminate design risks.
  - Reduce development costs.
• IP cores can be “Soft” or “Hard.”
  - The terminology has nothing to do with radiation susceptibility.
  - Soft core: the IP logic blocks are implemented in the system programmable logic area (user area). They are generally flexible in order to meet user needs.
  - Hard core: the IP logic is embedded in the FPGA device. It has limited or no flexibility.

Microsemi RTG4 FPGA and Its Embedded IP Cores

[Figure: Microsemi RTG4 FPGA block diagram. Legend: standard cell / SEU immune; flash-based / SEU immune. Blocks shown: up to 24 lanes of multi-protocol 3.125 Gbps SERDES (PMA) with XGMII and direct 20-bit bus interfaces; PCIe (up to 2 per device); system controller; logic elements; up to 16 SpaceWire clock and data recovery circuits; POR generator; math blocks (18x18); micro SRAM (64x18); large SRAM (1024x18); uPROM; JTAG; RT PLLs; RC OSC; AXI/AHB 667 Mbps DDR controller/PHY; multi-standard GPIO. The figure does not show the user programmable area. Figure is courtesy of Microsemi.]

Soft IP Core Insertion Flow

HDL: hardware description language

Soft IP can come in the form of HDL or gate-level netlists.

Pros of IP Core Insertion

• IP cores are very easy to use.
• As an example, a computer system can be designed in minutes by simply pressing buttons within a CAD tool.
• Students are graduating with IP core insertion experience.
• Design development costs less:
  - Lots of complexity with very little effort.
  - Shorter design cycle time.
  - Reusability reduces verification effort.
  - Employees require less expertise, hence less of a paycheck.

For complex, critical applications, the assumption that IP cores will cost less can be a myth.

CAD: computer aided design

Cons of IP Core Insertion in Critical Applications

• IP cores have limited visibility:
  - Difficult to verify and manipulate.
  - The design might not follow proper design rule protocol (but you will not know).
• If mitigation is required, it can be compromised.
• Design development costs less???:
  - Design cycle time can be lengthened because the selected user mode is not mainstream; it may never have been used or tested before.
• Reusability can be compromised:
  - Once an IP is custom configured, it is no longer “reusable logic.” For critical application standards, verification effort is increased.
  - Once an IP is inserted into a unique design, it is no longer “reusable logic.” For critical application standards, verification effort is increased.

Challenges: IP Core Insertion in Critical Applications

• Beware: pushing a button on a CAD tool can be misleading.
• Does the core follow proper synchronous design methodology?
• How has the design been vetted and verified prior to your usage?
• Research must be performed in order to understand whether the IP can reliably be inserted into your design:
  - Timing characteristics: can the IP perform at the mission's specified speed?
  - Can the IP core fit into the device with all other necessary logic?
  - Are the I/O of the IP compatible with your device or with the other components you have in your device?
  - Does the IP require mitigation?

Challenges: IP Core Verification in Critical Applications

• Design reviews require the design to be parsed by a team of specialists.
  - Some IP cores are so complex that they are close to impossible to parse.
  - Some IP cores are in gate-level netlist form instead of HDL. They are also close to impossible to parse.
  - Some IP cores are locked and cannot be viewed by any individual.
• Although datasheets are available, users end up relying on IP core models and blind testing.
• The point is that, because of limited visibility and complexity, IP cores are hard to verify.
• Enhanced verification techniques exist but still have limitations regarding black-box (or black-box-like) IP.

IP Core Mitigation in Critical Applications

Dual Redundancy (DR) and Triple Modular Redundancy (TMR)

Stop, investigate, and note limitations before pushing that CAD tool button.

Dual Redundant IP Cores

• There are no correction mechanisms.
• If the DR comparator detects a bad compare, the system stops and action is taken.
• Pro: if designed correctly, the system can be masked from IP core failures.
• Con: the probability of failure (hardware reliability or single event upset (SEU)) is at least doubled.
  - Although the system can be masked, system availability is decreased.
  - Depending on the critical application, the reduction in availability can compromise adherence to mission requirements.
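A minimal Python sketch of the dual-redundancy behavior described above, assuming an equality comparator and constant (exponential) failure rates. The names `BadCompare`, `dr_output`, `dr_stop_rate`, and `p_stop_within` are hypothetical. Because a stop is triggered when either copy (or the comparator itself) fails, the failure rates add, which is why the probability of failure is at least doubled.

```python
import math


class BadCompare(Exception):
    """Raised when the two redundant IP core outputs disagree."""


def dr_output(out_a, out_b):
    """Pass the output through only when both copies agree (no correction is possible)."""
    if out_a != out_b:
        # A mismatch cannot tell us which copy is wrong, so the system must stop.
        raise BadCompare(f"bad compare: {out_a!r} != {out_b!r}")
    return out_a


def dr_stop_rate(lam_block, lam_comparator=0.0):
    """Stop rate of a DR pair: either copy or the comparator failing triggers a stop."""
    return 2.0 * lam_block + lam_comparator


def p_stop_within(t, lam_block, lam_comparator=0.0):
    """Probability that the DR system has stopped by time t (exponential model)."""
    return 1.0 - math.exp(-dr_stop_rate(lam_block, lam_comparator) * t)


if __name__ == "__main__":
    print(dr_output(0x5A, 0x5A))          # both copies agree: prints 90
    lam = 1e-4                            # hypothetical per-hour failure rate of one IP core
    print(p_stop_within(1000, lam))       # DR pair
    print(1.0 - math.exp(-lam * 1000))    # single unmitigated core, for comparison
```

The sketch shows masking, not correction: a mismatch halts the system, so availability drops even though bad data never propagates.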

How To Insert TMR into a Design: FPGA User Design Flow

[Figure: FPGA user design flow, from specification to HDL to synthesis.]

• TMR can be inserted into the HDL. This is generally not done because it is too difficult.
• The output of synthesis is a gate-level netlist that represents the given HDL function.
• TMR can be inserted during synthesis or post synthesis.
• If inserted post synthesis, the gate-level netlist is replicated, ripped apart, and voters plus feedback are inserted.

HDL: hardware description language
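To make the post-synthesis step concrete, the following Python sketch applies a toy, DTMR-style transformation to a gate-level netlist modeled as a list of cells. The `Cell` type, the cell kinds (`DFF`, `LUT`, `MAJ`), and the `_A`/`_B`/`_C` naming are hypothetical and do not correspond to any particular TMR tool; clocks and primary inputs are assumed to stay shared. Every cell is replicated three times, and each flip-flop output is routed through a per-domain majority voter, so the voted value is what feeds back into the next-state logic.

```python
from dataclasses import dataclass

DOMAINS = ("A", "B", "C")  # hypothetical names for the three redundant copies


@dataclass(frozen=True)
class Cell:
    name: str           # instance name; also used as the name of the net it drives
    kind: str           # hypothetical cell types: "DFF", "LUT", or "MAJ" (voter)
    inputs: tuple = ()  # names of the nets that feed this cell


def insert_tmr_post_synthesis(netlist):
    """Toy DTMR-style pass: triplicate every cell and insert a voter after each DFF."""
    dffs = {c.name for c in netlist if c.kind == "DFF"}
    cells = {c.name for c in netlist}

    def source(net, dom):
        if net in dffs:
            return f"{net}_vote_{dom}"   # DFF outputs are consumed through this domain's voter
        if net in cells:
            return f"{net}_{dom}"        # combinational nets stay inside their own domain
        return net                       # primary inputs stay shared (an assumption)

    result = []
    for dom in DOMAINS:
        # Replicate the original netlist and rewire its inputs to this domain.
        for c in netlist:
            result.append(Cell(f"{c.name}_{dom}", c.kind,
                               tuple(source(i, dom) for i in c.inputs)))
        # One voter per DFF per domain, fed by all three copies of that DFF.
        for q in sorted(dffs):
            result.append(Cell(f"{q}_vote_{dom}", "MAJ",
                               tuple(f"{q}_{d}" for d in DOMAINS)))
    return result


if __name__ == "__main__":
    # A toggle flip-flop: q <= d, d = LUT(q, en), so q feeds back into its own input.
    netlist = [Cell("q", "DFF", ("d",)), Cell("d", "LUT", ("q", "en"))]
    for cell in insert_tmr_post_synthesis(netlist):
        print(cell)
```

In the toggle example, `d_A` now reads `q_vote_A` instead of `q`, which is the feedback path that lets a corrupted flip-flop copy be overwritten with the voted value on the next clock edge.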

Various TMR Schemes: Different Topologies

Block diagram of local TMR (LTMR): only flip-flops (DFFs) are triplicated and data paths stay singular; voters are brought into the design and placed in front of the DFFs.

Block diagram of block TMR (BTMR): a complex function containing combinatorial logic (CL) and flip-flops (DFFs) is triplicated as three black boxes; majority voters are placed at the outputs of the triplet.

Block diagram of distributed TMR (DTMR): the entire design is triplicated except for the global routes (e.g., clocks); voters are brought into the design and placed after the flip-flops (DFFs). DTMR masks and corrects most single event upsets (SEUs).
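The majority voter used by all three topologies, and the voted feedback that lets LTMR and DTMR correct an upset, can be sketched in a few lines of Python. This is a behavioral model under simplifying assumptions: register values are integers, one shared next-state function stands in for the three copies of combinational logic, and `majority` and `tmr_register_step` are hypothetical names.

```python
def majority(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority vote: each output bit takes the value held by at least two inputs."""
    return (a & b) | (a & c) | (b & c)


def tmr_register_step(copies, next_state):
    """One clock edge of a triplicated register with voted feedback.

    Every copy reloads from next_state(voted value), so a single upset copy is
    overwritten with the correct value on the next clock edge.
    """
    voted = majority(*copies)
    return [next_state(voted) for _ in range(3)]


if __name__ == "__main__":
    copies = [5, 5, 5]
    copies[1] ^= 0b100                 # SEU flips bit 2 of copy 1: copies become [5, 1, 5]
    print(majority(*copies))           # 5: the upset is masked
    copies = tmr_register_step(copies, lambda v: v + 1)
    print(copies)                      # [6, 6, 6]: the upset is corrected
```

Masking comes from `majority`; correction comes only from feeding the voted value back into every copy, which is exactly what BTMR lacks.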


To be presented by Melanie Berg at the Microelectronics Reliability & Qualification Working Meeting (MRQW), El Segundo, CA February 7-8, 2017 


IP Cores and Block Triple Modular Redundancy: BTMR

[Figure: three IP core copies whose outputs feed a majority voter that can only mask errors.]

3x the error rate with triplication and no correction/flushing.

• Need feedback to correct.
• Cannot apply internal correction from voted outputs.
• If blocks are not regularly flushed (e.g., reset), errors can accumulate; BTMR may not be an effective technique.
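The accumulation problem can be illustrated with a small, deterministic Python simulation. Assumptions (all hypothetical): each black box is a free-running counter, an SEU flips bit 4 of one copy's state, the voter sees only the three outputs, and a flush is a reset of all copies. Without feedback, one upset is masked, but once a second copy is hit the voter outputs the wrong value until the blocks are flushed.

```python
def majority(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority vote."""
    return (a & b) | (a & c) | (b & c)


def simulate_btmr(cycles, upsets, flush_every=None):
    """Simulate BTMR over counter blocks; return the cycles with a wrong voted output.

    upsets maps a cycle number to the index (0-2) of the copy upset in that cycle.
    flush_every, if given, resets all copies (and the reference) every N cycles.
    """
    reference = 0            # fault-free value the system should output
    copies = [0, 0, 0]       # internal state of the three black boxes
    wrong = []
    for cycle in range(cycles):
        if flush_every and cycle > 0 and cycle % flush_every == 0:
            reference, copies = 0, [0, 0, 0]   # periodic reset flushes accumulated errors
        if cycle in upsets:
            copies[upsets[cycle]] ^= 1 << 4    # SEU: flip bit 4 of one copy
        if majority(*copies) != reference:
            wrong.append(cycle)
        # No feedback: each copy advances from its own (possibly corrupted) state.
        reference += 1
        copies = [c + 1 for c in copies]
    return wrong


if __name__ == "__main__":
    hits = {3: 0, 7: 1}                               # upsets in two different copies
    print(simulate_btmr(12, hits))                    # [7, 8, 9, 10, 11]
    print(simulate_btmr(12, hits, flush_every=5))     # []
```

The first upset at cycle 3 is silently carried by copy 0; the second upset at cycle 7 outvotes the good copy. A flush between the two upsets clears the latent error, which is why regular flushing is a precondition for BTMR to help.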
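
Explanation of BTMR Strength and Weakness using Classical Reliability Models

$$R_{\text{simplex}}(t) = e^{-\lambda t}, \qquad \mathrm{MTTF}_{\text{simplex}} = \int_0^\infty e^{-\lambda t}\,dt = \frac{1}{\lambda}$$

$$R_{\text{BTMR}}(t) = 3e^{-2\lambda t} - 2e^{-3\lambda t}, \qquad \mathrm{MTTF}_{\text{BTMR}} = \int_0^\infty R_{\text{BTMR}}(t)\,dt = \frac{3}{2\lambda} - \frac{2}{3\lambda} = \frac{5}{6\lambda} \approx \frac{0.833}{\lambda}$$

[Figure: Reliability across Fluence: Simplex System versus BTMR Version. Reliability versus time (0 to 10000 minutes) for "System No TMR" and "BTMR System."]

Operating a BTMR design in the early time interval will provide an increase in reliability. However, over time, BTMR reliability drops off faster than a system with no TMR.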
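The classical model behind this slide, R_BTMR(t) = 3e^{-2λt} - 2e^{-3λt} with an MTTF of 5/(6λ) ≈ 0.833/λ versus 1/λ for a single block, can be checked numerically. The Python sketch below assumes identical blocks with a constant failure rate λ, a perfect voter, and no repair or flushing; the function names and the example rate of one failure per 5000 minutes are hypothetical. Setting R_BTMR = R_simplex with R = e^{-λt} gives R(2R - 1)(R - 1) = 0, so the curves cross where single-block reliability is 0.5, i.e., at t = ln 2/λ.

```python
import math


def r_simplex(t, lam):
    """Reliability of one unmitigated block with constant failure rate lam."""
    return math.exp(-lam * t)


def r_btmr(t, lam):
    """2-of-3 BTMR of identical blocks: 3R^2 - 2R^3, perfect voter, no repair/flush."""
    r = r_simplex(t, lam)
    return 3.0 * r**2 - 2.0 * r**3


def mttf_btmr(lam):
    """Closed form 5 / (6 * lam), about 0.833 / lam."""
    return 5.0 / (6.0 * lam)


def mttf_numeric(reliability, lam, steps=200_000):
    """Trapezoidal integral of reliability(t) from 0 to 20/lam (the tail beyond is negligible)."""
    t_end = 20.0 / lam
    h = t_end / steps
    total = 0.5 * (reliability(0.0, lam) + reliability(t_end, lam))
    total += sum(reliability(i * h, lam) for i in range(1, steps))
    return total * h


def crossover_time(lam):
    """Time after which BTMR is less reliable than a single block: t = ln 2 / lam."""
    return math.log(2.0) / lam


if __name__ == "__main__":
    lam = 1.0 / 5000.0                   # hypothetical: one failure per 5000 minutes
    print(crossover_time(lam))           # ~3466 minutes
    for t in (1000, 8000):
        print(t, r_simplex(t, lam), r_btmr(t, lam))
    print(mttf_numeric(r_btmr, lam), mttf_btmr(lam))
```

At t = 1000 minutes BTMR is ahead of the simplex block; at t = 8000 minutes it is behind, matching the shape of the plotted curves.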

BTMR Bottom Line

• How long does your BTMR system need to operate relative to the MTTF of one of its unmitigated blocks?
• Over time, a BTMR system is less reliable than an unmitigated system.
• Adding more replicated blocks (e.g., an N-out-of-M system) will only increase reliability during the short window near start time. However, over time, the reliability of an N-out-of-M system falls faster as M (the number of replicated blocks) grows.
• Unfortunately, BTMR is the most common form of TMR used with IP cores. Users are not getting the level of protection that they require.
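The N-out-of-M claim can be checked with the standard k-out-of-n reliability formula. The Python sketch below assumes independent, identical blocks that each have reliability r at a given time, a perfect voter, and majority voting (N = (M + 1)/2 with M odd); the function names are hypothetical. Under these assumptions every majority system equals a single block when r = 0.5, beats it for r > 0.5, and falls below it, faster for larger M, once r < 0.5. With exponential blocks, r = 0.5 occurs at t = ln 2 × MTTF ≈ 0.69 × MTTF of one unmitigated block, which gives one concrete answer to the first question on this slide.

```python
from math import comb


def r_n_of_m(n, m, r):
    """Probability that at least n of m independent identical blocks (each reliable with prob. r) work."""
    return sum(comb(m, i) * r**i * (1.0 - r) ** (m - i) for i in range(n, m + 1))


def r_majority(m, r):
    """Majority-voted system of m blocks (m odd): needs (m + 1) // 2 working blocks."""
    return r_n_of_m((m + 1) // 2, m, r)


if __name__ == "__main__":
    # Block reliability early in the mission, at the crossover, and later in the mission.
    for r in (0.9, 0.5, 0.4):
        print(r, [round(r_majority(m, r), 4) for m in (1, 3, 5, 7)])
```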

SCS750 BTMR uPs (Maxwell) on GAIA: Performance Is Lower than Assumed

GAIA is a European Space Agency mission to map our Galaxy.

[Figure: SCS750 block diagram: three PowerPC processors with TMR mitigation; a radiation-tolerant FPGA containing the system controller, memory controller, PCI, timers, interrupts, DMA, and two UARTs; Reed-Solomon protection; a PCI-PCI bridge on a 32-bit 33 MHz PCI bus; USRT and Military Standard 1553 interfaces in a radiation-tolerant FPGA.]

BTMR recovery sequence:
(1) The faulty uP is masked;
(2) The system stalls and then is flushed;
(3) The system is brought back up with all uPs synchronized.

DTMR and LTMR Strategies Provide Correction and Hence Increase Availability and Reliability

• Depending on the target FPGA, DTMR or LTMR can be suitable mitigation strategies:
  - LTMR for Microsemi FPGA products (do not use in SRAM-based FPGAs).
  - DTMR for SRAM-based FPGA products (e.g., Xilinx).
• Depending on your TMR insertion tool, some IP cores can have LTMR or DTMR inserted during the synthesis process.
• Most tools still have problems inserting TMR into IP. This is another reason why BTMR is so popular: it is simple to implement.
• Warning: some IP cores are black boxes into which no TMR other than BTMR can be inserted. Take this into account prior to IP selection.